Cobb-Douglas Regression Model

This notebook provides the code for the Cobb-Douglas Regression model.

At the top of each notebook, all necessary libraries are imported before the actual coding starts. Afterwards, functions are defined that will later be called to simulate the data, print out metrics, and create visualisations. The functions are the same for each model within a notebook file.

In [1]:
#importing all necessary libraries
import pandas as pd
import numpy as np
import random
import ipywidgets as widgets
from IPython.display import Javascript, display

Function for Data Simulation

In [2]:
def DataFunction(alpha, rho, intercept, n, low=0, high=10):
    """
    This function generates a dataset with three different variables.
    The variables ln(L) and ln(K) are randomly drawn from a uniform distribution and lie in a range between 0 and 10 by default.
    A seed is set to 0 to enable reproducibility.
    The variable ln(Y) is then computed with the Translog function, using the randomly generated values and the parameters alpha, rho, and the intercept.
    Afterwards, a dictionary with all values is created.
    Applying the function returns a pandas.DataFrame object with n samples.
    """
    #setting the seed
    np.random.seed(0)
    
    #draw random values
    l_rand = np.random.uniform(low, high, n)
    k_rand = np.random.uniform(low, high, n)
    
    #computing the values for ln(Y) with the Translog production function
    y_TL = intercept + alpha*l_rand + (1-alpha)*k_rand - 1/2*rho*alpha*(1-alpha)*((k_rand-l_rand)**2)
    
    #create a dictionary with all variables
    TL_dict = {'ln(Y)': y_TL, 'ln(L)': l_rand, 'ln(K)': k_rand}
    
    return(pd.DataFrame(TL_dict))
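As the docstring notes, ln(Y) is generated with the Translog function; when ρ = 0 the quadratic term vanishes and the formula collapses to the Cobb-Douglas form ln(Y) = intercept + α·ln(L) + (1−α)·ln(K). A minimal standalone check of this reduction (re-implementing the formula directly rather than calling DataFunction; the parameter values are arbitrary):

```python
import numpy as np

#arbitrary parameter values for the check; rho = 0 removes the quadratic term
alpha, rho, intercept = 0.5, 0.0, 0.1

rng = np.random.default_rng(0)
l = rng.uniform(0, 10, 5)
k = rng.uniform(0, 10, 5)

#Translog: intercept + alpha*ln(L) + (1-alpha)*ln(K) - 1/2*rho*alpha*(1-alpha)*(ln(K)-ln(L))^2
translog = intercept + alpha*l + (1-alpha)*k - 0.5*rho*alpha*(1-alpha)*(k-l)**2
#Cobb-Douglas in logs: intercept + alpha*ln(L) + (1-alpha)*ln(K)
cobb_douglas = intercept + alpha*l + (1-alpha)*k

print(np.allclose(translog, cobb_douglas))  # → True
```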

Error term

In [3]:
def error_term(sigma, n, mu=0):
    """
    This function randomly draws n values from a normal distribution.
    When the function is called, the standard deviation and the number of values have to be defined.
    The mean of the distribution is 0 by default.
    Values are returned in the form of a numpy.array.
    """
    np.random.seed(0)
    
    u = np.array(np.random.normal(mu, sigma, n))
    
    return(u)
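A quick standalone sketch illustrates why the fixed seed makes the error term reproducible (this re-implements the draw with the seed exposed as an extra argument, which is an addition for illustration, not the notebook's signature):

```python
import numpy as np

def draw_errors(sigma, n, mu=0, seed=0):
    #same idea as error_term above, with the seed as an explicit parameter
    np.random.seed(seed)
    return np.random.normal(mu, sigma, n)

u1 = draw_errors(sigma=1.0, n=1000)
u2 = draw_errors(sigma=1.0, n=1000)

#identical seed -> identical draws
print(np.array_equal(u1, u2))  # → True
#sample mean and standard deviation are close to mu = 0 and sigma = 1
print(abs(u1.mean()) < 0.1, abs(u1.std() - 1.0) < 0.1)
```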

Summary function

In [4]:
from sklearn import metrics

def summary(test_values, predicted_values):
    """
    This function computes the root mean squared error (RMSE) and the mean absolute error (MAE).
    It uses the values from a test set and the fitted values to return a pandas.DataFrame object.
    """
    #computing the RMSE and the MAE with the respective functions from the sklearn library
    RMSE = (metrics.mean_squared_error(test_values, predicted_values))**(0.5)
    MAE = metrics.mean_absolute_error(test_values, predicted_values)
    
    #create a dictionary with the metrics
    summary_dict = {'Metric': ['RMSE','MAE'],
                       'Value': [RMSE, MAE]}
    
    return(pd.DataFrame(summary_dict))

3D Plot function

In [5]:
import plotly.express as px
import plotly.graph_objects as go

def Plot_function(model, data):
    """
    This function visualises a regression model for a data set in a 3 dimensional space.
    The plotly package is used and enables interaction with the plot.
    """
    #defining the size of the mesh grid and the margins
    mesh_size = 0.09
    margin = 0
    
    #fitting the model to the exogenous and endogenous variables of the whole dataset.
    model.fit(X, y)
    
    #create a mesh grid to later run the model on
    x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
    y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
    xrange = np.arange(x_min, x_max, mesh_size)
    yrange = np.arange(y_min, y_max, mesh_size)
    xx, yy = np.meshgrid(xrange, yrange)
    
    #run model
    pred = model.predict(np.c_[xx.ravel(), yy.ravel()])
    pred = pred.reshape(xx.shape)

    #generate the plot
    fig = px.scatter_3d(data, x='ln(L)', y='ln(K)', z='ln(Y)')
    fig.update_traces(marker=dict(size=2))
    fig.add_traces(go.Surface(x=xrange, y=yrange, z=pred, name='pred_surface'))
    fig.show()

Heatmap function

In [6]:
#To prepare the grid of the heatmap, a list is created to represent the pixels on the grid.
#This double loop creates a list of value combinations between 0 and 10
liste = []
i = 0

while i <= 10:
    j = 0
    while j <= 10:
        liste.append([i,j])
        j += 0.1
    i += 0.1
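The same grid of 0.1-spaced value pairs can also be built without explicit loops; a vectorised sketch using np.linspace and np.meshgrid (an equivalent alternative, not the notebook's code, which also avoids the floating-point drift of repeatedly adding 0.1):

```python
import numpy as np

#101 evenly spaced values from 0 to 10 in steps of 0.1
vals = np.linspace(0, 10, 101)
ii, jj = np.meshgrid(vals, vals, indexing='ij')

#stack the two coordinates into an (n, 2) array of [i, j] pairs
grid = np.column_stack([ii.ravel(), jj.ravel()])
print(grid.shape)  # → (10201, 2)
```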
In [7]:
#The previously created list is then transformed into a numpy.array
data = np.asarray(liste)
#A pandas.DataFrame object is then created; the generated grid values are assigned to the variables ln(L) and ln(K)
columns = ['ln(L)', 'ln(K)']
df_heatmap = pd.DataFrame(data = data, columns = columns)
In [8]:
import plotly.express as px

def Heatmap_function(model):
    '''
    Returns a heatmap of the given regression model for values between 0 and 10.
    '''
    #defining the grid values
    X_heatmap = df_heatmap[['ln(L)','ln(K)']].values
    
    #compute predictions of the model
    y_heatmap = model.predict(X_heatmap)
    
    #create a pandas.DataFrame with the grid values and the fitted values
    df_heatmap['predicted ln(Y)'] = y_heatmap
    heatmap = df_heatmap.pivot(index='ln(L)', columns='ln(K)', values='predicted ln(Y)')
    
    #displaying the heatmap
    fig = px.imshow(heatmap,labels=dict(color="predicted value"))
    fig.update_yaxes(autorange=True)
    fig.show()

Cobb-Douglas Regression Model

This is where the actual regression model begins. The user has to define values for the given parameters by adjusting the sliders and the input cell. Do not re-execute the cell, because it will reset the parameters to their defaults. Afterwards, press the "Run all cells below" button to execute the code below with the desired set of parameters.

As mentioned in the written part of the thesis, the scikit-learn library by Pedregosa et al. (2011) is used to code the different regression models. The documentation of the library can be accessed at the following link: https://scikit-learn.org/stable/. Functions taken from this package will be explained when they occur.

In [9]:
intercept_slider = widgets.FloatSlider(value=0.1, min=0.1, max=1, step=0.1, description='Intercept')
alpha_slider = widgets.FloatSlider(value=0.5, min=0.5, max=1, step=0.1, description='α')

rho_slider = widgets.FloatSlider(value=0, min= 0, max= 1 , step=0.1, description='ρ')
sigma_slider = widgets.FloatSlider(value=1, min=0.5, max=1.5, step=0.25, description='σ')
n_input = widgets.IntText(value = 125, description = 'Samples')

display(intercept_slider,alpha_slider,rho_slider,sigma_slider, n_input)
In [10]:
def run_all(ev):
    display(Javascript('IPython.notebook.execute_cell_range(IPython.notebook.get_selected_index()+1, IPython.notebook.ncells())'))

button = widgets.Button(description="Run all cells below")
button.on_click(run_all)
display(button)

Regression on the Dataset

In [11]:
#calling the data function with the parameter values and assigning the result to the variable data
data = DataFunction(alpha = alpha_slider.value,
                    intercept = intercept_slider.value,
                    n = n_input.value,
                    rho = rho_slider.value)
In [12]:
#getting the first 5 rows of the DataFrame
data.head()
Out[12]:
ln(Y) ln(L) ln(K)
0 5.876034 5.488135 6.063932
1 3.771913 7.151894 0.191932
2 4.621691 6.027634 3.015748
3 6.125284 5.448832 6.601735
4 3.668662 4.236548 2.900776
In [13]:
#adding the error term to the values of the endogenous variable
data['ln(Y)'] += error_term(sigma = sigma_slider.value,
                            n = n_input.value)
In [14]:
#getting the first 5 rows of the new DataFrame
data.head()
Out[14]:
ln(Y) ln(L) ln(K)
0 7.640086 5.488135 6.063932
1 4.172070 7.151894 0.191932
2 5.600429 6.027634 3.015748
3 8.366177 5.448832 6.601735
4 5.536220 4.236548 2.900776
In [15]:
#assigning the exogenous variables to X and the endogenous variable to y
X = data[['ln(L)','ln(K)']].values
y = data[['ln(Y)']].values

The train_test_split function splits vectors and matrices into a test and a training set. In this case the matrix X and the vector y are split. The test_size is set to 25% of the whole data. random_state = 0 sets a random seed to make the split sets reproducible, since the data is shuffled before splitting (Pedregosa et al., 2011).

In [16]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

Next, the regression model is initiated using the LinearRegression class. The LinearRegression class initiates a linear regression model using the ordinary least squares method. Afterwards, the model is fit to the training data (Pedregosa et al., 2011).

In [17]:
from sklearn.linear_model import LinearRegression
LinearReg = LinearRegression()
LinearReg.fit(X_train, y_train)
Out[17]:
LinearRegression()
In [18]:
#accessing the estimated coefficients of the model

intercept    = LinearReg.intercept_[0]
coefficients = LinearReg.coef_[0,:]

print('Intercept:',round(intercept,4))
print('Coefficients:',round(coefficients[0],4),',',round(coefficients[1],4))
print('Sum of coefficients:',round(coefficients[0],4)+round(coefficients[1],4))
Intercept: 0.2883
Coefficients: 0.4894 , 0.4865
Sum of coefficients: 0.9759
In [19]:
#use the fitted model to predict values of the endogenous variable for the test set
y_pred = LinearReg.predict(X_test)
In [20]:
#print RMSE and MAE, using the summary function
summary(test_values=y_test, predicted_values=y_pred)
Out[20]:
Metric Value
0 RMSE 1.05093
1 MAE 0.81912

3D Regression Plot and Heatmap

In [21]:
Plot_function(model=LinearReg, data=data)
In [22]:
Heatmap_function(model=LinearReg)